IN THE CLAIMS

1. (Currently amended) A method of optimizing data mining in a computer, the data mining
being performed by the computer to detect one or more outliers in a high dimensional data set of 
personal attributes stored on a data storage device coupled to the computer, the method comprising 
the steps of: 

determining one or more subsets of dimensions and corresponding ranges in the data set 
which are sparse in density using an algorithm capable of utilizing at least one of the processes of 
solution recombination, selection and mutation over a population of multiple solutions; and 

determining one or more data points in the data set which contain these subsets of dimensions
and corresponding ranges, the one or more data points being identified as the one or more outliers 
in the data set; 

wherein the one or more outliers in the high dimensional data set of personal attributes are 
capable of being presented on a display.

2. (Original) The method of claim 1, wherein a range is defined as a set of contiguous values
on a given dimension. 
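
As an illustration of the second step of claim 1 together with the range definition of claim 2, the following is a minimal sketch and not the claimed implementation. It assumes a hypothetical representation in which a sparse subset of dimensions is a mapping from a dimension index to an inclusive (low, high) range; the names `Projection` and `points_in_projection` and the sample records are invented for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation (an assumption for this sketch): a projection maps
# a dimension index to an inclusive (low, high) range of contiguous values.
Projection = Dict[int, Tuple[float, float]]


def points_in_projection(data: Sequence[Sequence[float]],
                         projection: Projection) -> List[int]:
    """Return the indices of records whose values fall inside every range."""
    covered = []
    for idx, record in enumerate(data):
        if all(low <= record[dim] <= high
               for dim, (low, high) in projection.items()):
            covered.append(idx)
    return covered


# Invented sample records: (age, income, flag).
data = [
    [25.0, 40000.0, 1.0],
    [62.0, 38000.0, 0.0],
    [23.0, 150000.0, 1.0],
    [41.0, 52000.0, 0.0],
]
# Assume an earlier search found this combination of ranges to be sparse.
sparse_projection = {0: (20.0, 30.0), 1: (100000.0, 200000.0)}
outliers = points_in_projection(data, sparse_projection)
```

Here only the third record lies inside both ranges, so it is the one reported as an outlier under these assumptions.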

3. (Original) The method of claim 1, wherein the sets of dimensions and corresponding
ranges in which the data is sparse in density is quantified by a sparsity coefficient measure. 

4. (Original) The method of claim 3, wherein the sparsity coefficient measure S(D) is defined
as

$$S(D) = \frac{n(D) - N \cdot f^{k}}{\sqrt{N \cdot f^{k} \cdot (1 - f^{k})}},$$

where k represents the number of dimensions in the data set, f represents the fraction of data
points in each range, N is the total number of data points in the data set, and n(D) is the number
of data points in a set of dimensions D.

5. (Original) The method of claim 3, wherein a given sparsity coefficient measure is
inversely proportional to the number of data points in a given set of dimensions and corresponding 
ranges. 
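
The sparsity coefficient of claims 3 to 5 can be illustrated numerically. The sketch below assumes the coefficient is the deviation of the observed count n(D) from its expected value N·f^k, divided by the standard deviation sqrt(N·f^k·(1 − f^k)); the function name `sparsity_coefficient`, its parameter names, and the sample figures are assumptions made for the example.

```python
import math


def sparsity_coefficient(n_d: int, n_total: int, f: float, k: int) -> float:
    """Deviation of the observed count from its expectation, in standard deviations.

    n_d     -- number of data points in the chosen ranges (n(D))
    n_total -- total number of data points in the data set (N)
    f       -- fraction of data points in each range
    k       -- number of dimensions in the chosen subset
    """
    expected = n_total * f ** k
    std_dev = math.sqrt(n_total * f ** k * (1.0 - f ** k))
    return (n_d - expected) / std_dev


# Invented figures: 10,000 records, each range holding 10% of the data,
# a 2-dimensional combination of ranges that contains a single record.
s = sparsity_coefficient(n_d=1, n_total=10000, f=0.1, k=2)
```

With these figures about 100 records would be expected in the region, so a single observed record gives a strongly negative coefficient; fewer points in the region push the value further below zero.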

6. (Original) The method of claim 1, wherein a set of dimensions is determined using an
algorithm which uses the processes of solution recombination, selection and mutation over a 
population of multiple solutions. 

7. (Original) The method of claim 6, wherein the process of solution recombination 
comprises combining characteristics of two solutions in order to create two new solutions. 

8. (Original) The method of claim 6, wherein the process of mutation comprises changing 
a particular characteristic of a solution in order to result in a new solution.

9. (Original) The method of claim 6, wherein the process of selection comprises biasing the
population in order to favor solutions which are more optimum. 
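
The three processes recited in claims 6 to 9 can be sketched as follows. This is one possible reading and not the claimed implementation: it assumes a solution is encoded as a list with one entry per dimension, holding either a range index or `None` when the dimension is unused, and it assumes that a lower (more negative) fitness value marks a sparser, better solution. The function names, one-point crossover, and rank-weighted sampling are illustrative choices; the sketch does not enforce a fixed number of used dimensions.

```python
import random
from typing import Callable, List, Optional, Tuple

# Assumed encoding: one entry per dimension, a range index or None (unused).
Solution = List[Optional[int]]


def recombine(a: Solution, b: Solution,
              rng: random.Random) -> Tuple[Solution, Solution]:
    """One-point crossover: two parent solutions yield two new solutions."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(s: Solution, n_ranges: int, rng: random.Random) -> Solution:
    """Change one characteristic (one position) to a different range index.

    Requires n_ranges >= 2 so that a different value always exists.
    """
    child = list(s)
    pos = rng.randrange(len(child))
    choices = [r for r in range(n_ranges) if r != child[pos]]
    child[pos] = rng.choice(choices)
    return child


def select(population: List[Solution],
           fitness: Callable[[Solution], float],
           rng: random.Random) -> List[Solution]:
    """Resample the population with a bias toward lower fitness values."""
    ranked = sorted(population, key=fitness)
    # Best-ranked solution gets the largest weight, worst gets weight 1.
    weights = [len(ranked) - i for i in range(len(ranked))]
    return rng.choices(ranked, weights=weights, k=len(ranked))
```

In a full search loop these would be applied repeatedly: select, then recombine pairs, then mutate, with the fitness function supplied by a sparsity measure such as the coefficient of claim 3.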

10. (Currently amended) A method of optimizing data mining in a computer, the data mining
being performed by the computer to detect one or more outliers in a high dimensional data set of 
personal attributes stored on a data storage device coupled to the computer, the method comprising 
the steps of: 

identifying and mining one or more sub-patterns in the data set which have abnormally low 
presence not due to randomness using an algorithm capable of utilizing at least one of the processes 
of solution recombination, selection and mutation over a population of multiple solutions; and 

identifying one or more records which have the one or more sub-patterns present in them as 
the one or more outliers; 

wherein the one or more outliers in the high dimensional data set of personal attributes are 
capable of being presented on a display.

11. (Currently amended) Apparatus for optimizing data mining to detect one or more outliers
in a high dimensional data set comprising: 

a computer having a memory and a data storage device coupled thereto, wherein the data 
storage device stores a data store, the data store having a high dimensional data set of personal 
attributes; and 

one or more computer programs, performed by the computer, for: (i) determining one or 
more subsets of dimensions and corresponding ranges in the data set which are sparse in density 
using an algorithm capable of utilizing at least one of the processes of solution recombination, 
selection and mutation over a population of multiple solutions; and (ii) determining one or more data
points in the data set which contain these subsets of dimensions and corresponding ranges, the one 
or more data points being identified as the one or more outliers in the data set; 

wherein the one or more outliers in the high dimensional data set of personal attributes are 
capable of being presented on a display.

12. (Original) The apparatus of claim 11, wherein a range is defined as a set of contiguous
values on a given dimension. 

13. (Original) The apparatus of claim 11, wherein the sets of dimensions and corresponding
ranges in which the data is sparse in density is quantified by a sparsity coefficient measure.

14. (Original) The apparatus of claim 13, wherein the sparsity coefficient measure S(D) is
defined as

$$S(D) = \frac{n(D) - N \cdot f^{k}}{\sqrt{N \cdot f^{k} \cdot (1 - f^{k})}},$$

where k represents the number of dimensions in the data set, f represents the fraction of data
points in each range, N is the total number of data points in the data set, and n(D) is the number
of data points in a set of dimensions D.

15. (Original) The apparatus of claim 13, wherein a given sparsity coefficient measure is
inversely proportional to the number of data points in a given set of dimensions and corresponding 
ranges. 

16. (Original) The apparatus of claim 11, wherein a set of dimensions is determined using
an algorithm which uses the processes of solution recombination, selection and mutation over a 
population of multiple solutions. 

17. (Original) The apparatus of claim 16, wherein the process of solution recombination 
comprises combining characteristics of two solutions in order to create two new solutions. 

18. (Original) The apparatus of claim 16, wherein the process of mutation comprises 
changing a particular characteristic of a solution in order to result in a new solution. 

19. (Original) The apparatus of claim 16, wherein the process of selection comprises biasing
the population in order to favor solutions which are more optimum. 

20. (Currently amended) Apparatus for optimizing data mining to detect one or more outliers 
in a high dimensional data set comprising: 

a computer having a memory and a data storage device coupled thereto, wherein the data 
storage device stores a data store, the data store having a high dimensional data set of personal 
attributes; and 

one or more computer programs, performed by the computer, for: (i) identifying and mining
one or more sub-patterns in the data set which have abnormally low presence not due to randomness 
using an algorithm capable of utilizing at least one of the processes of solution recombination, 
selection and mutation over a population of multiple solutions; and (ii) identifying one or more 
records which have the one or more sub-patterns present in them as the one or more outliers;

wherein the one or more outliers in the high dimensional data set of personal attributes are
capable of being presented on a display.

21. (Currently amended) An article of manufacture comprising a program storage medium 
readable by a computer and embodying one or more instructions executable by the computer to 
perform method steps for optimizing data mining, the data mining being performed by the computer 
to detect one or more outliers in a high dimensional data set of personal attributes stored on a data 
storage device coupled to the computer, the method comprising the steps of: 

determining one or more subsets of dimensions and corresponding ranges in the data set
which are sparse in density using an algorithm capable of utilizing at least one of the processes of 
solution recombination, selection and mutation over a population of multiple solutions; and 

determining one or more data points in the data set which contain these subsets of dimensions
and corresponding ranges, the one or more data points being identified as the one or more outliers 
in the data set; 

wherein the one or more outliers in the high dimensional data set of personal attributes are 
capable of being presented on a display.

22. (Original) The article of claim 21, wherein a range is defined as a set of contiguous 
values on a given dimension. 

23. (Original) The article of claim 21, wherein the sets of dimensions and corresponding 
ranges in which the data is sparse in density is quantified by a sparsity coefficient measure. 

24. (Original) The article of claim 23, wherein the sparsity coefficient measure S(D) is
defined as

$$S(D) = \frac{n(D) - N \cdot f^{k}}{\sqrt{N \cdot f^{k} \cdot (1 - f^{k})}},$$

where k represents the number of dimensions in the data set, f represents the fraction of data
points in each range, N is the total number of data points in the data set, and n(D) is the number
of data points in a set of dimensions D.

25. (Original) The article of claim 23, wherein a given sparsity coefficient measure is
inversely proportional to the number of data points in a given set of dimensions and corresponding 
ranges. 

26. (Original) The article of claim 21, wherein a set of dimensions is determined using an
algorithm which uses the processes of solution recombination, selection and mutation over a 
population of multiple solutions. 

27. (Original) The article of claim 26, wherein the process of solution recombination 
comprises combining characteristics of two solutions in order to create two new solutions.

28. (Original) The article of claim 26, wherein the process of mutation comprises changing 
a particular characteristic of a solution in order to result in a new solution. 

29. (Original) The article of claim 26, wherein the process of selection comprises biasing 
the population in order to favor solutions which are more optimum. 

30. (Currently amended) An article of manufacture comprising a program storage medium 
readable by a computer and embodying one or more instructions executable by the computer to 
perform method steps for optimizing data mining, the data mining being performed by the computer
to detect one or more outliers in a high dimensional data set of personal attributes stored on a data 
storage device coupled to the computer, the method comprising the steps of: 

identifying and mining one or more sub-patterns in the data set which have abnormally low 
presence not due to randomness using an algorithm capable of utilizing at least one of the processes 
of solution recombination, selection and mutation over a population of multiple solutions; and 

identifying one or more records which have the one or more sub-patterns present in them as 
the one or more outliers;

wherein the one or more outliers in the high dimensional data set of personal attributes are
capable of being presented on a display.